Add Nettrace compression and multi-process support #1258
mjsabby commented on Sep 5, 2020
- Adds C# implementation for LZ-based compression and decompression that is used in BPerf File Format (the file format we're intending to replace)
- Adds a flag for the compression type
- Adds the next 4 bytes to the header; this is the decompressed size.
cc @noahfalk
@mjsabby, are there corresponding runtime changes for this?
public static unsafe ArraySegment<byte> Decompress(ArraySegment<byte> input, int decompressedSize)
{
    byte[] output = new byte[decompressedSize * 2];
    fixed (byte* inputPtr = &input.Array[input.Offset])
Can we avoid pinning these and work with indexes or Spans rather than raw pointers? I know this code is hardly the only offender, but one of the things I am hoping to do with the EventPipeEventSource is convert it so that it doesn't use any unsafe pointer manipulations.
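To illustrate the suggestion, here is a minimal sketch of an index-plus-Span read that could stand in for a pointer read such as `Unsafe.ReadUnaligned<ushort>(ip)`. The class and method names are hypothetical and not part of the PR, and the sketch assumes the data is little-endian.

```csharp
using System;
using System.Buffers.Binary;

public static class SpanReadSketch
{
    // Hypothetical helper (not from the PR): reads a little-endian ushort at 'pos'
    // and advances 'pos'. Returns false instead of reading past the end of the input.
    public static bool TryReadUInt16(ReadOnlySpan<byte> input, ref int pos, out ushort value)
    {
        value = 0;
        if (pos < 0 || input.Length - pos < sizeof(ushort))
        {
            return false;
        }

        // Slice is bounds-checked, so no pinning or raw pointer arithmetic is needed.
        value = BinaryPrimitives.ReadUInt16LittleEndian(input.Slice(pos, sizeof(ushort)));
        pos += sizeof(ushort);
        return true;
    }
}
```

An `ArraySegment<byte>` converts implicitly to `ReadOnlySpan<byte>`, so a caller could pass `input` directly and track an `int` position instead of a `byte*`.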
The compression code does check out of bounds and is likely to be a hot path. I've removed all the compression code, and only kept decompression code so it is easier to audit if that helps. Let me know.
@@ -887,7 +887,8 @@ public unsafe void FromStream(Deserializer deserializer)
 internal enum EventBlockFlags : short
 {
     Uncompressed = 0,
-    HeaderCompression = 1
+    HeaderCompression = 1,
+    EventBlockULZCompression = 2
@noahfalk If you could do a once over to see if this is the direction you wanted ...
bool isULZCompressed = (flags & (ushort)EventBlockFlags.EventBlockULZCompression) != 0;

int eventBlockSize = eventBlockData.Length;
if (isULZCompressed && headerSize >= 24)
If the isULZCompressed flag and headerSize don't match, I would error out in the same way as the checks above (Assert + return). We should probably have a better error handling scheme, but this at least marks where the errors are detected in the code and prevents continued parsing.
At the moment this if block would not run but also the if(!isULZCompressed) block below would not run, presumably leaving the parser in a broken state.
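A minimal sketch of the kind of consistency check described above. The class and method names are hypothetical; the flag value 2 and the 24-byte threshold are taken from the diff, and everything else is an assumption.

```csharp
using System;

public static class BlockHeaderCheckSketch
{
    // Values taken from the diff under review; the names are illustrative only.
    private const ushort EventBlockULZCompression = 2;
    private const int MinCompressedHeaderSize = 24;

    // Returns false when the flag says the block is compressed but the header is
    // too small to carry the decompressed-size field. The caller would then stop
    // parsing the block (the review suggests pairing this with an assert).
    public static bool IsHeaderConsistent(ushort flags, int headerSize)
    {
        bool isULZCompressed = (flags & EventBlockULZCompression) != 0;
        if (isULZCompressed && headerSize < MinCompressedHeaderSize)
        {
            return false;
        }

        return true;
    }
}
```

With a check like this placed before the branch, a mismatched block is rejected up front rather than falling through both the compressed and uncompressed paths.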
@@ -1388,7 +1409,8 @@ enum CompressedHeaderFlags
     ActivityId = 1 << 4,
     RelatedActivityId = 1 << 5,
     Sorted = 1 << 6,
-    DataLength = 1 << 7
+    DataLength = 1 << 7,
+    ProcessId = 1 << 8,
The flags field is a single byte, no room to set the 9th bit : ) I'd suggest changing bit 2 into CaptureThreadPidAndSequence and encoding the process id as the VarInt64(current_event_proc_id - previous_event_proc_id). This means:
- Bit 2 is clear (probably most events) -> proc id is unchanged from last event, no additional data encoded in the header
- Bit 2 is set, encoded process id field is single byte 0 -> process id is unchanged from last event, 1 additional byte used in header. This case happens every time two adjacent events are logged from different threads in the same process.
- Bit 2 is set, encoded process id field is non-zero -> process_id = prev_event_process_id + ReadVarInt64(encoded_proc_id_field). This occurs whenever adjacent events have different PID. Encoding size is variable depending on magnitude of proc id, probably 2 bytes.
We may also want an optimization so that single-proc traces never encode a process id, regardless of whether bit 2 is set. This ensures that runtime-produced traces don't regress in size.
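The delta scheme described above could be sketched roughly as follows. All names here are hypothetical, and the sketch assumes the variable-length integer is an unsigned 7-bits-per-byte encoding with the delta stored as a wrapped 64-bit value; the actual format may differ.

```csharp
using System;
using System.Collections.Generic;

public static class ProcessIdDeltaSketch
{
    // Assumed encoding: 7 data bits per byte, high bit set when more bytes follow.
    public static void WriteVarUInt64(List<byte> output, ulong value)
    {
        while (value >= 0x80)
        {
            output.Add(unchecked((byte)(value | 0x80UL)));
            value >>= 7;
        }
        output.Add((byte)value);
    }

    // Bounds-checked read; fails on truncated input or on more than 10 bytes.
    public static bool TryReadVarUInt64(ReadOnlySpan<byte> input, ref int pos, out ulong value)
    {
        value = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            if (pos >= input.Length) return false;
            byte b = input[pos++];
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return true;
        }
        return false;
    }

    // Writes the process id as a delta from the previous event's process id.
    public static void WriteProcessId(List<byte> output, long previousPid, long currentPid)
    {
        WriteVarUInt64(output, unchecked((ulong)(currentPid - previousPid)));
    }

    // process_id = prev_event_process_id + decoded delta (wrapping arithmetic).
    public static bool TryReadProcessId(ReadOnlySpan<byte> input, ref int pos, long previousPid, out long pid)
    {
        pid = previousPid;
        if (!TryReadVarUInt64(input, ref pos, out ulong delta)) return false;
        pid = unchecked(previousPid + (long)delta);
        return true;
    }
}
```

Under these assumptions an unchanged process id costs one byte (a single 0) and a small positive delta costs one or two bytes. A negative delta still round-trips through the wrapping arithmetic but takes 10 bytes, so a real implementation might prefer a signed-friendly encoding.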
public static void ReadFromFormat(int version, byte* headerPtr, bool useHeaderCompresion, ref EventPipeEventHeader header)
{
    switch (version)
    {
We should only need to add one new major version? The current shipped version of the format is 4 and the new one would be 5.
I'm a bit wary of mixing v4 and v5 functionality and having a single implementation for both. I realize this might make for a little code duplication. Presumably any feature work we do in the runtime during .NET 6.0 that would also necessitate a version increase will get rolled into v5 as well. This could mean that we need to bring back the v4 version of the code later anyway if the delta between v4 and v5 becomes large enough.
if (run == 7)
{
    run += (int)DecodeMod(ref ip);
With bad data I assume it's possible that ip == ipEnd, in which case this would read outside the buffer.
if (len == 15 + MinMatch)
{
    len += (int)DecodeMod(ref ip);
Another buffer overrun possible here? (ip == ipEnd)
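One way to close both DecodeMod overruns is to check the position before every byte is consumed. The sketch below is hypothetical: the PR's DecodeMod body is not shown here, so it assumes a ULZ-style scheme in which each byte is added after shifting by 7 more bits and a byte below 128 ends the value.

```csharp
using System;

public static class DecodeModSketch
{
    // Hypothetical bounds-checked variant of DecodeMod. Returns false when the
    // input ends before the value is complete, instead of reading past the buffer.
    public static bool TryDecodeMod(ReadOnlySpan<byte> input, ref int pos, out uint value)
    {
        value = 0;
        for (int shift = 0; shift <= 21; shift += 7)
        {
            if (pos >= input.Length)
            {
                return false; // covers the ip == ipEnd case raised in the review
            }

            uint c = input[pos++];
            value += c << shift;
            if (c < 128)
            {
                break; // a byte without the high bit ends the value
            }
        }

        return true;
    }
}
```

A caller could treat a `false` result the same way the decompressor already treats other malformed input, for example by returning -1.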
    return -1;
}

int dist = ((token & 16) << 12) + Unsafe.ReadUnaligned<ushort>(ip);
Another buffer overrun possible here? (ip == ipEnd)